Abstract

The researchers made significant progress in all of the proposed research areas. The first major
task in the proposal involved duality in stochastic control and optimal stopping. In support of this
task, the researchers developed new methods for efficiently solving optimal stopping problems of
partially observable Markov processes and optimal stopping problems under jump-diffusion processes.
The researchers also studied the information relaxation approach and established duality
for controlled Markov diffusions and weakly coupled dynamic programs. In the second major task,
aimed at solving difficult global optimization problems, the researchers proposed and developed
a new framework that integrates the idea of model-based randomized optimization with gradient-based
optimization, and further extended this method to simulation optimization problems. All the
developed methods have been tested through numerical experiments and demonstrated excellent
performance. The methods have also been applied to problems in revenue management, option
pricing, and power allocation in communication networks.



DISTRIBUTION  A:  Distribution  approved  for  public  release. 


1  Introduction 


In this research project, we studied basic questions aimed at meeting challenges in information
superiority, logistics, and planning for the Air Force of the future. For successful military
operations, the future requirements of the Air Force will include information fusion at a much
larger scale and much more agile, responsive, and integrated systems. Such problems and systems are
exceedingly complex; however, a central part of them is decision making, which often takes place
sequentially in time, subject to uncertainty about the future and to limited, partial information
at hand. To address these problems, we investigated efficient computational methodologies for
dynamic decision making under uncertainty and partial information. In the course of this research,
we proposed to (i) develop and study efficient simulation-based methodologies for dynamic decision
making under uncertainty and partial information; and (ii) study the application of these decision
making models and methodologies to practical problems, such as those arising in planning, logistics,
and risk management. The completed research resulted in (i) new mathematical tools and theories for
dynamic decision making and optimal stopping; (ii) powerful algorithms for solving problems that
currently cannot be tackled by existing methods; and (iii) useful applications of these models and
new methodologies to a wide range of problems. Our work can provide effective tools for an integrated
approach to Global Awareness (Intelligence, Surveillance and Reconnaissance, or ISR), Command and
Control (C2), planning, and logistics for the Air Force.

In  particular,  we  combined  three  approaches  in  our  research: 

•  developing  efficient  simulation-based  methodologies  that  have  a  sound  theoretical  basis; 

•  providing  convergence  guarantees  and  error  bounds  for  the  developed  algorithms; 

•  studying  the  application  of  these  decision  making  models  and  methodologies  to  practical 
problems,  such  as  those  arising  in  revenue  management,  risk,  and  communication  networks. 

1.1  Duality  in  Stochastic  Control  and  Optimal  Stopping 

Stochastic control and optimal stopping provide a powerful paradigm for modelling sequential
optimization under uncertainty. Typical models include Markov decision processes (MDPs),
controlled Markov diffusions, and optimal stopping under full or partial observation. To be more
specific about the problems we have considered, we describe the MDP below; the other models are
formulated similarly. Consider a finite-horizon MDP on the probability space $(\Omega, \mathcal{G}, \mathbb{P})$.
Time is indexed by $\mathcal{K} = \{0, 1, \dots, K\}$. Suppose $X$ is the state space and $A$ is the
action space. The state $\{x_k\}$ follows the equation


$$x_{k+1} = f(x_k, a_k, v_{k+1}), \quad k = 0, 1, \dots, K-1, \tag{1}$$


where $a_k \in A$ is the action whose value is decided at time $k$, and $v_k$ is a random variable with
a known distribution taking values in the set $V$. The evolution of the information is described by
the filtration $\mathbb{G} = \{\mathcal{G}_0, \dots, \mathcal{G}_K\}$ with $\mathcal{G} = \mathcal{G}_K$. In particular, each $v_k$ is $\mathcal{G}_k$-measurable. Denote by
$\mathbb{A}$ the set of all policies $a = (a_0, \dots, a_{K-1})$, i.e., each $a_k$ takes values in $A$. Let $\mathbb{A}_{\mathbb{G}}$ be the set of
non-anticipative policies that are adapted to the filtration $\mathbb{G}$, i.e., each $a_k$ is $\mathcal{G}_k$-measurable. Given
$x_0 \in X$, the objective is to maximize the expected reward by selecting a non-anticipative policy
$a \in \mathbb{A}_{\mathbb{G}}$:

$$V_0(x_0) = \sup_{a \in \mathbb{A}_{\mathbb{G}}} \mathbb{E}\left[ \sum_{k=0}^{K-1} g_k(x_k, a_k) + \Lambda(x_K) \,\middle|\, x_0 \right]. \tag{2}$$
The expectation in (2) is taken with respect to the random sequence $v = (v_1, \dots, v_K)$.
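When the state, action, and noise spaces are finite, (2) can in principle be solved by backward induction, $V_k(x) = \max_{a \in A} \{ g_k(x, a) + \mathbb{E}[V_{k+1}(f(x, a, v))] \}$ with $V_K = \Lambda$. The Python sketch below illustrates this on a hypothetical toy model; the grid, the dynamics `f`, the rewards `g` and `terminal`, and the noise distribution `NOISE` are illustrative assumptions, not a model from this report. This exact recursion is what becomes intractable as the state dimension grows.

```python
# Minimal backward-induction sketch for (1)-(2) on a hypothetical toy model:
# finite X, A and a finitely supported i.i.d. noise v (all numbers are assumptions).
K = 3
STATES = range(5)                  # X = {0, ..., 4}
ACTIONS = (0, 1, 2)                # A = {0, 1, 2}
NOISE = {0: 0.3, 1: 0.4, 2: 0.3}   # P(v_{k+1} = v)


def f(x, a, v):
    """Dynamics x_{k+1} = f(x_k, a_k, v_{k+1}), clipped to the state grid."""
    return min(max(x + a - v, 0), 4)


def g(k, x, a):
    """Stage reward g_k(x_k, a_k) (hypothetical)."""
    return 1.0 * min(x, 2) - 0.6 * a


def terminal(x):
    """Terminal reward Lambda(x_K) (hypothetical)."""
    return 0.5 * x


def q_value(k, x, a, V_next):
    """g_k(x, a) + E[V_{k+1}(f(x, a, v))]."""
    return g(k, x, a) + sum(p * V_next[f(x, a, v)] for v, p in NOISE.items())


def backward_induction():
    """Return V[k][x] for k = 0..K and a greedy Markov policy pol[k][x]."""
    V = [None] * (K + 1)
    pol = [None] * K
    V[K] = {x: terminal(x) for x in STATES}
    for k in reversed(range(K)):
        pol[k] = {x: max(ACTIONS, key=lambda a: q_value(k, x, a, V[k + 1])) for x in STATES}
        V[k] = {x: q_value(k, x, pol[k][x], V[k + 1]) for x in STATES}
    return V, pol


def evaluate(pol, x0):
    """Exact expected reward of a Markov policy pol[k][x] started at x0."""
    W = {x: terminal(x) for x in STATES}
    for k in reversed(range(K)):
        W = {x: q_value(k, x, pol[k][x], W) for x in STATES}
    return W[x0]


V, pol = backward_induction()
```

`evaluate` gives the value of any fixed Markov policy; such values are lower bounds on $V_0(x_0)$, which is the role simulated suboptimal policies play when exact induction is out of reach.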

Numerical solutions to stochastic control and optimal stopping problems often suffer from the
so-called “curse of dimensionality”, i.e., the computational complexity grows exponentially as the
dimension of the state increases. To overcome this difficulty, a variety of approximate dynamic
programming techniques have been developed (see [?]). These methods often generate suboptimal
policies, and simulating the dynamic system under such policies yields lower bounds on the optimal
expected reward. Because the accuracy of a suboptimal policy is generally unknown, this lack of a
performance guarantee can potentially be addressed by providing an upper bound on the optimal
expected reward. To this end, researchers have recently devoted significant effort to studying
duality in stochastic control and optimal stopping. The main idea of this dual approach is to allow
the decision maker to foresee future uncertainty but to impose a penalty for accessing this
information in advance. The optimal expected reward can be achieved by imposing an optimal penalty,
which, however, is not directly available in practice. Hence, approximation schemes based on
different types of optimal penalties have been proposed to derive tight dual bounds, such as [27]
and [?].
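On a small finite model, the dual approach can be checked end to end. The Python sketch below restates a hypothetical toy MDP (all names and numbers are assumptions) and computes, by exact enumeration of noise paths, the perfect-information bound $\mathbb{E}[\max_{a} \{\sum_k (g_k(x_k, a_k) - z_k) + \Lambda(x_K)\}]$ for two penalties: zero, which gives a valid but generally loose upper bound, and the value function-based penalty $z_k = V_{k+1}(f(x_k, a_k, v_{k+1})) - \mathbb{E}[V_{k+1}(f(x_k, a_k, v))]$, which has zero conditional mean and makes the inner maximum equal $V_0(x_0)$ on every path. In practice $V$ is unknown, so an approximation of it would be plugged in.

```python
import itertools

# Hypothetical toy MDP for (1)-(2); every name and number here is an assumption.
K = 3
STATES = range(5)
ACTIONS = (0, 1, 2)
NOISE = {0: 0.3, 1: 0.4, 2: 0.3}


def f(x, a, v):
    return min(max(x + a - v, 0), 4)


def g(k, x, a):
    return 1.0 * min(x, 2) - 0.6 * a


def terminal(x):
    return 0.5 * x


def value_functions():
    """Backward induction: V[k][x] for k = 0..K."""
    V = [None] * (K + 1)
    V[K] = {x: terminal(x) for x in STATES}
    for k in reversed(range(K)):
        V[k] = {x: max(g(k, x, a) + sum(p * V[k + 1][f(x, a, v)] for v, p in NOISE.items())
                       for a in ACTIONS) for x in STATES}
    return V


def inner_problem(x0, path, penalty):
    """Deterministic max over action sequences once the whole path (v_1..v_K) is revealed."""
    best = float("-inf")
    for actions in itertools.product(ACTIONS, repeat=K):
        x, total = x0, 0.0
        for k, a in enumerate(actions):
            total += g(k, x, a) - penalty(k, x, a, path[k])   # path[k] is v_{k+1}
            x = f(x, a, path[k])
        best = max(best, total + terminal(x))
    return best


def dual_bound(x0, penalty):
    """Exact expectation of the inner problem over all noise paths (finite support)."""
    bound = 0.0
    for path in itertools.product(NOISE, repeat=K):
        prob = 1.0
        for v in path:
            prob *= NOISE[v]
        bound += prob * inner_problem(x0, path, penalty)
    return bound


V = value_functions()


def zero_penalty(k, x, a, v_next):
    return 0.0


def value_penalty(k, x, a, v_next):
    """z_k = V_{k+1}(f(x, a, v_{k+1})) - E[V_{k+1}(f(x, a, v))]: zero conditional mean."""
    mean = sum(p * V[k + 1][f(x, a, v)] for v, p in NOISE.items())
    return V[k + 1][f(x, a, v_next)] - mean
```

Comparing `dual_bound(1, zero_penalty)` and `dual_bound(1, value_penalty)` with `V[0][1]` shows the gap that a good penalty closes.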

The PI and her collaborators’ contributions include: (i) a filtering-based duality approach for
solving optimal stopping problems under partial observation; (ii) a parameterized duality approach
for solving MDPs; (iii) development of duality theory in controlled Markov diffusions; (iv) an
approach combining information relaxation with Lagrangian relaxation to solve large-scale weakly
coupled dynamic programs; and (v) a highly efficient algorithm for solving the dual problem of
dynamic programming.

1.2  Model-based  Global  Optimization 

In  dynamic  decision  making  and  control,  one-stage  optimization  often  comes  up  as  a  subproblem. 
Hence, we considered the following optimization problem:

$$x^* \in \arg\max_{x \in X} H(x), \quad X \subseteq \mathbb{R}^n, \tag{3}$$

where the solution space $X$ is a nonempty compact set in $\mathbb{R}^n$, and $H : X \to \mathbb{R}$ is a real-valued
function. Denote the optimal function value by $H^*$, i.e., there exists an $x^*$ such that
$H(x) \le H^* = H(x^*)$ for all $x \in X$. Assume that $H$ is bounded on $X$, i.e., there exist $H_{lb} > -\infty$ and
$H_{ub} < \infty$ such that $H_{lb} \le H(x) \le H_{ub}$ for all $x \in X$.

Model-based global optimization methods use probability distributions to weight promising
areas of the solution space, where the distribution is updated iteratively based on the performance
of samples drawn from the current distribution. They are well suited for global optimization
problems where there is limited structural information about the objective function (e.g.,
derivatives, convexity).

The PI and her collaborators’ contributions include: (i) the new idea of incorporating
model-based optimization into direct gradient search; (ii) development of a rigorous algorithmic
framework, gradient-based adaptive stochastic search; and (iii) development of a series of
algorithms for black-box optimization and simulation optimization problems, together with the
release of user-friendly software.
2 Research Results


2.1  Duality  in  Stochastic  Control  and  Optimal  Stopping 

Building upon the recently developed duality theory for stochastic dynamic programs, we tackled
stochastic control and optimal stopping problems (with either full or partial observation) from
three angles: (i) we developed the duality theory and proposed a parameterized penalty framework in
the dual formulation, based on new insights we gained into the structure of optimal penalty
functions; (ii) we incorporated advanced Monte Carlo techniques to develop highly efficient
algorithms for evaluating the dual formulation; (iii) we studied the numerical performance of our
algorithms on a wide range of application problems in revenue and risk management.

In [31] and [33], we focused on optimal stopping under partial observation and developed a
lower-and-upper-bound approach with moderate computational cost. The motivation is that the gap
between the lower and upper bounds indicates the quality of the approximate solutions: to guarantee
a high-quality approximate solution, we can increase the computational effort until the gap between
the two bounds falls below a desired tolerance. We proposed a filtering-based duality approach that
complements a suboptimal stopping time (hence an asymptotic lower bound) with an asymptotic upper
bound on the value function. Since our approach is not tied to a particular model and involves only
Monte Carlo simulation, it can be generalized to any partially observable Markov process (POMP) to
which the particle filtering technique can be applied. Our method relies on the martingale duality
formulation of the fully observable optimal stopping problem, which was proposed by [26] and [13]
in the setting of pricing American options under constant volatility. The numerical method in
[?] generates the suboptimal martingale from approximate value functions, whereas [?] later
developed an alternative duality-based method that uses a suboptimal policy to generate the
martingale. From the perspective of modeling fidelity versus computational complexity, comparing
optimal stopping of POMPs with its counterpart for fully observable Markov processes is not
trivial. In particular, the difference between their value functions cannot be quantified in
general and is problem dependent, so we were also interested in learning which features of the
underlying probabilistic model influence this difference. As an example, our numerical experiments
on pricing American options under partially observable stochastic volatility showed that our
asymptotic upper bound is strictly less than the option price in the model where the volatility is
treated as directly observable, and that the difference is especially pronounced when the effect of
the volatility is dominant. This in turn shows that our method provides a better criterion for
evaluating the performance of a suboptimal policy in the partially observable model.
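The filtering-based approach needs only a particle filter for the hidden state. As an illustration of that ingredient alone, the Python sketch below is a bootstrap particle filter (propagate, weight, resample) for a hypothetical stochastic-volatility model in which the log-variance is hidden and only returns are observed; the model and its parameters `MU`, `PHI`, `SIGMA` are assumptions. The filtered quantities it produces are the inputs from which a suboptimal stopping rule and a filtering-based dual bound would be computed; neither of those is implemented here.

```python
import numpy as np

# Hypothetical stochastic-volatility model (an assumption for illustration):
#   h_{t+1} = MU + PHI * (h_t - MU) + SIGMA * eta_t      (hidden log-variance)
#   y_t     = exp(h_t / 2) * eps_t                        (observed return)
MU, PHI, SIGMA = -1.0, 0.95, 0.3


def simulate_sv(T, seed=0):
    """Simulate T hidden log-variances and the corresponding observed returns."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    y = np.empty(T)
    h_prev = MU
    for t in range(T):
        h_prev = MU + PHI * (h_prev - MU) + SIGMA * rng.standard_normal()
        h[t] = h_prev
        y[t] = np.exp(h_prev / 2) * rng.standard_normal()
    return h, y


def bootstrap_filter(y, n_particles=2000, seed=1):
    """Return the filtered means E[h_t | y_1, ..., y_t] for t = 1, ..., T."""
    rng = np.random.default_rng(seed)
    # start from the stationary distribution of h
    particles = MU + SIGMA / np.sqrt(1 - PHI**2) * rng.standard_normal(n_particles)
    means = np.empty(len(y))
    for t, obs in enumerate(y):
        # 1) propagate each particle through the state equation
        particles = MU + PHI * (particles - MU) + SIGMA * rng.standard_normal(n_particles)
        # 2) weight by the likelihood N(obs; 0, exp(h)), computed in the log domain
        logw = -0.5 * (particles + obs**2 * np.exp(-particles))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = w @ particles
        # 3) multinomial resampling to counter weight degeneracy
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return means


h, y = simulate_sv(200)
filtered = bootstrap_filter(y)
```

Multinomial resampling at every step is the simplest choice; systematic resampling or resampling only when the effective sample size drops are common alternatives.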

In [32], we noted that constructing a good penalty in the dual formulation of MDPs is usually
difficult due to the tradeoff between its effectiveness and its computational cost. As a
consequence, all penalties developed so far for continuous-state MDPs are linear functions, for the
sake of keeping the intermediate pathwise optimization problem solvable. To capture the nonlinear
features of the optimal penalty in the general case, we introduced a class of simple nonlinear
penalty functions that can be applied to general MDPs. This class of nonlinear penalties, together
with the classes of linear penalties developed in [4] and [10], all lead to dual bounds on the value
function. In summary, our contributions are: (i) we developed a framework of parameterized penalties
in the dual representation of MDPs, where the optimal choice of the parameters can be determined by
a convex (stochastic) optimization problem, and the theoretical result guarantees a dual bound that
is at least as tight when more penalties are used; (ii) we introduced a new class of nonlinear
penalties that can be applied to general MDPs and are also very easy to implement in practice;
(iii) we carried out numerical experiments that provide insights into the design and choice of
penalties. The numerical results showed a considerable improvement in the tightness of the dual
bound when using our parameterized penalties.

In [31], we extended the information relaxation approach and the dual representation of MDPs
to controlled Markov diffusions. The motivation is that the Hamilton-Jacobi-Bellman (HJB) equation
rarely admits a closed-form value function, especially when the dimension of the state space is
high or there are constraints on the control space. Many numerical methods have been developed
based on different approximation schemes: [?] considered the Markov chain approximation method
by discretizing the HJB equation; [12] extended the approximate linear programming method to
controlled Markov diffusions. Another standard numerical approach is to discretize time, which
reduces the original continuous-time problem to an MDP so that the techniques of approximate
dynamic programming can be applied. Since the quality of a numerical solution is hard to assess in
many problems, we were interested in deriving a tight dual bound on the value function of a
controlled Markov diffusion by formulating its dual representation. To see whether a dual framework
based on information relaxation, similar to that for MDPs, can be established for controlled Markov
diffusions, we presented the information relaxation-based dual formulation of controlled Markov
diffusions using the technical machinery of “anticipating stochastic calculus” (see, e.g., [?] and
[23]). We further established weak duality, strong duality, and complementary slackness results in
parallel with those in the dual formulation of MDPs. We investigated one type of optimal penalty,
the so-called “value function-based penalty”, whose key feature is that it can be written compactly
as an Itô stochastic integral under the natural filtration generated by the Brownian motions. This
compact expression potentially enables us to design suboptimal penalties in simple forms and also
facilitates the computation of the dual bound. We then focused on the computational aspect, using
the value function-based optimal penalty. A direct application is illustrated by a classic dynamic
portfolio choice problem with predictable returns and intermediate consumption: we considered the
numerical solution of a discrete-time model obtained by discretizing a continuous-time model, and
proposed an effective class of penalties that are easy to evaluate to derive dual bounds on the
value function of the discrete-time model. In addition, we found that the Lagrangian approach
proposed earlier by [9] has a similar flavor to the gradient-based penalty proposed by [2] for
MDPs. The main difference between their work and ours is that we propose a more general framework
that may incorporate their Lagrangian approach as a special case; the optimal penalty we developed
in this work is value function-based, while their Lagrangian approach behaves like a gradient-based
penalty. In summary, our contributions include: (i) we established a dual representation of
controlled Markov diffusions based on information relaxation; (ii) we explored the structure of the
optimal penalty and exposed the connection between MDPs and controlled Markov diffusions; (iii)
based on the dual representation of controlled Markov diffusions, we demonstrated its practical use
in a dynamic portfolio choice problem. In many cases, comparing the numerical upper bounds on the
expected utility with the lower bounds induced by suboptimal policies for the same problem shows
that our proposed penalties are near optimal.

In [?], we considered pricing American-style derivatives, which is essentially an optimal
stopping problem. This has been an active and challenging research problem over the last thirty
years, especially when the prices of the underlying stocks follow jump-diffusion processes, which
have become increasingly important to investors. To date, various jump-diffusion models have been
proposed to fit real data in financial markets, including the normal jump-diffusion model, affine
jump-diffusion models, jump models based on Lévy processes, and the double exponential,
mixed-exponential, and hyper-exponential jump-diffusion models. All these models try to capture
features of market behaviour that cannot be well explained by pure-diffusion models, such as the
heavy-tail risk borne by the market. In general, closed-form expressions for American-style
derivatives can hardly be derived under these jump-diffusion models, due to the multiple exercise
opportunities and the randomness in the underlying asset price caused by both jumps and diffusions.
Hence, various numerical methods have been proposed to tackle American-style option pricing under
jump-diffusion models. We generalized the idea of the “true martingale” in [2] to the Bermudan
option pricing problem. By bounding the empirical difference between the approximation and the
objective martingale, we proved that the true martingale approximation converges to the objective
martingale in the mean square sense as the time discretization step goes to zero. In numerical
experiments, we investigated the effectiveness of the least-squares regression approach (L-S
algorithm) [?] for Bermudan option pricing under jump-diffusion models. In particular, we found
that by incorporating the European option price under the corresponding pure-diffusion model
(referred to as the “non-jump European option”) into the function basis of the L-S algorithm, the
quality of the induced suboptimal exercise strategies and of the lower bounds can be significantly
improved. Motivated by the explicit structure of the optimal dual martingale, we proposed a function
basis that can be employed in the true martingale (T-M) algorithm to obtain tight upper bounds on
the option price. Implementing our algorithm and the A-B algorithm on several sets of numerical
experiments showed that both methods generate tight and stable upper bounds on the option price;
however, our algorithm is much more efficient than the A-B algorithm in practice because it avoids
nested simulation.
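The basis-augmentation idea can be sketched as follows, assuming a Merton (normal) jump-diffusion model with lognormal jump sizes, a Bermudan put, and illustrative parameter values; none of these settings are taken from the report. The L-S regression uses the basis $\{1, S/K, (S/K)^2\}$, optionally augmented with the Black-Scholes put price under the pure-diffusion part of the model, i.e., the “non-jump European option”. Because the exercise rule is fitted and applied on the same paths, the returned number is an L-S estimate rather than a strict lower bound; applying the fitted rule to an independent set of paths would give one. The T-M upper bound is not implemented.

```python
import math
import numpy as np

_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def bs_put(S, K, r, sigma, tau):
    """Black-Scholes European put under the pure-diffusion model (requires tau > 0)."""
    S = np.asarray(S, dtype=float)
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return K * math.exp(-r * tau) * _norm_cdf(-d2) - S * _norm_cdf(-d1)


def simulate_merton(S0, r, sigma, lam, m, delta, T, n_steps, n_paths, rng):
    """Risk-neutral Merton paths: jump sizes e^Y with Y ~ N(m, delta^2), Poisson rate lam."""
    dt = T / n_steps
    kappa = math.exp(m + 0.5 * delta**2) - 1.0           # E[e^Y] - 1
    drift = (r - lam * kappa - 0.5 * sigma**2) * dt
    S = np.empty((n_paths, n_steps + 1))
    S[:, 0] = S0
    for t in range(n_steps):
        n_jumps = rng.poisson(lam * dt, n_paths)
        jumps = m * n_jumps + delta * np.sqrt(n_jumps) * rng.standard_normal(n_paths)
        diffusion = sigma * math.sqrt(dt) * rng.standard_normal(n_paths)
        S[:, t + 1] = S[:, t] * np.exp(drift + diffusion + jumps)
    return S


def lsm_bermudan_put(S0=100.0, K=100.0, r=0.05, sigma=0.2, lam=0.5, m=-0.1,
                     delta=0.15, T=1.0, n_steps=10, n_paths=20000,
                     use_euro_basis=True, seed=0):
    rng = np.random.default_rng(seed)
    S = simulate_merton(S0, r, sigma, lam, m, delta, T, n_steps, n_paths, rng)
    dt, disc = T / n_steps, math.exp(-r * T / n_steps)
    cash = np.maximum(K - S[:, -1], 0.0)                 # exercise value at maturity
    for t in range(n_steps - 1, 0, -1):
        cash *= disc                                      # discount to exercise date t
        exercise = np.maximum(K - S[:, t], 0.0)
        itm = np.flatnonzero(exercise > 0)                # regress on in-the-money paths
        if itm.size < 10:
            continue
        x = S[itm, t] / K                                 # normalised price for the basis
        cols = [np.ones_like(x), x, x**2]
        if use_euro_basis:                                # "non-jump European option" basis
            cols.append(bs_put(S[itm, t], K, r, sigma, (n_steps - t) * dt) / K)
        X = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(X, cash[itm], rcond=None)
        stop = exercise[itm] > X @ coef                   # exercise if payoff beats continuation
        cash[itm[stop]] = exercise[itm[stop]]
    return max(disc * cash.mean(), max(K - S0, 0.0))
```

Calling `lsm_bermudan_put(use_euro_basis=False)` gives the plain polynomial-basis variant on the same simulated paths, for comparison.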

In [35], we studied the weakly coupled dynamic program, which describes a broad class of
stochastic optimization problems in which multiple controlled stochastic processes evolve
independently but are subject to a set of linking constraints imposed on the controls. One feature
of the weakly coupled dynamic program is that it decouples into lower-dimensional dynamic programs
when the linking constraints are dualized via Lagrangian relaxation, which yields an upper bound on
the optimal value of the original dynamic program. Building on the Lagrangian bound, we generalized
the information relaxation approach, which relaxes the non-anticipativity constraint on the
controls, to obtain a tighter dual bound. To tackle large-scale problems, we further proposed a
computationally tractable method based on information relaxation, and showed that it provides a
valid dual bound and that its performance admits a uniform bound regardless of the number of
subproblems. We implemented our method and demonstrated its use on a dynamic product promotion
problem and a linear quadratic control problem with nonconvex linking constraints.
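The Lagrangian step can be illustrated on a tiny instance. The Python sketch below (a hypothetical machine-repair model; `WEIGHTS`, `DECAY`, `COST`, and the budget `B` are assumptions) dualizes the per-period linking constraint “at most `B` repairs” with a constant multiplier `lam`, solves each machine's dynamic program separately, and compares the resulting bound with the exact coupled optimum obtained by dynamic programming on the joint state. It covers only the Lagrangian bound, not the information relaxation refinement built on top of it.

```python
import itertools

# Hypothetical weakly coupled DP: machines with health s in {0, 1, 2}; at most B
# repairs per period is the linking constraint. All numbers are assumptions.
K, B = 3, 1
STATES = (0, 1, 2)
WEIGHTS = (1.0, 1.5, 0.8)   # reward per unit of health, per machine
DECAY = (0.5, 0.7, 0.3)     # probability that health drops by one without repair
COST = 0.4                  # cost of one repair


def trans(i, s, a):
    """Next-state distribution of subproblem i under action a (1 = repair)."""
    if a == 1:
        return {2: 1.0}
    if s == 0:
        return {0: 1.0}
    return {s - 1: DECAY[i], s: 1.0 - DECAY[i]}


def reward(i, s, a):
    return WEIGHTS[i] * s - COST * a


def sub_value(i, s0, lam):
    """Subproblem i with the linking constraint dualized: pay lam per repair."""
    V = {s: WEIGHTS[i] * s for s in STATES}                 # terminal reward
    for _ in range(K):
        V = {s: max(reward(i, s, a) - lam * a
                    + sum(p * V[n] for n, p in trans(i, s, a).items())
                    for a in (0, 1))
             for s in STATES}
    return V[s0]


def lagrangian_bound(s0, lam):
    """K * lam * B + sum_i V_i^lam(s0_i): an upper bound for any lam >= 0."""
    return K * lam * B + sum(sub_value(i, s, lam) for i, s in enumerate(s0))


def coupled_value(s0):
    """Exact optimum by DP on the joint state (only feasible for tiny instances)."""
    n = len(s0)
    joint = list(itertools.product(STATES, repeat=n))
    acts = [a for a in itertools.product((0, 1), repeat=n) if sum(a) <= B]
    V = {x: sum(WEIGHTS[i] * x[i] for i in range(n)) for x in joint}
    for _ in range(K):
        new_V = {}
        for x in joint:
            best = float("-inf")
            for a in acts:
                val = sum(reward(i, x[i], a[i]) for i in range(n))
                dists = [trans(i, x[i], a[i]).items() for i in range(n)]
                for outcome in itertools.product(*dists):   # independent transitions
                    prob = 1.0
                    for _, p in outcome:
                        prob *= p
                    val += prob * V[tuple(s for s, _ in outcome)]
                best = max(best, val)
            new_V[x] = best
        V = new_V
    return V[tuple(s0)]
```

Minimizing `lagrangian_bound(s0, lam)` over `lam >= 0` (for example on a grid) gives the tightest bound of this form; the joint DP in `coupled_value` grows exponentially with the number of machines, which is why the decoupled bound matters.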

In [41], we developed a regression-based framework for approximating the optimal dual
penalty in a non-nested manner, by exploring the structure of the function space consisting of all
feasible dual penalties. The resulting approximations remain feasible dual penalties and thus
yield valid dual bounds on the optimal value function. We showed that the proposed framework
is computationally efficient and that the resulting dual penalties lead to numerically tractable
dual problems. Finally, we applied the framework to a high-dimensional dynamic trading problem to
demonstrate its effectiveness in solving the dual problems of complex dynamic programs.

2.2  Gradient-based  Adaptive  Stochastic  Search 

We distinguish between instance-based and model-based global optimization methods. In
instance-based methods, the search for new candidate solutions depends directly on previously
generated solutions, e.g., simulated annealing, genetic algorithms (GAs), tabu search, and nested
partitions. On the other hand, model-based algorithms typically assume a sampling distribution
(i.e., a probabilistic model) over the solution space, often within a parameterized family of
distributions, and iteratively carry out two interrelated steps: (1) draw candidate solutions from
the sampling distribution; (2) use the evaluations of these candidate solutions to update the
sampling distribution. The hope is that at every iteration the sampling distribution is biased
towards the more promising regions of the solution space and will eventually concentrate on one or
more of the optimal solutions. Examples of model-based algorithms include ant colony optimization
[?], annealing adaptive search (AAS) [28], probability collectives (PCs) [30], estimation of
distribution algorithms (EDAs) [17], the cross-entropy (CE) method [29], model reference adaptive
search (MRAS) [?], and the interacting-particle algorithm [21, 22]. The various model-based
algorithms differ mainly in how they update the sampling distribution. Because model-based
algorithms work with a population of candidate solutions at each iteration, they are more robust
in exploring the solution space than their classical counterparts that work with a single candidate
solution at a time (e.g., simulated annealing). They have found widespread applications in solving
hard nonlinear optimization problems and have the potential to be useful tools in a number of areas
related to estimation and control [?].
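The two-step loop of a model-based algorithm can be sketched in Python with a Gaussian sampling distribution that is refitted to the best-performing candidates, in the style of the cross-entropy method. This is a minimal illustration rather than any of the cited implementations; the elite fraction, initial distribution, and iteration count are arbitrary assumptions.

```python
import numpy as np


def cross_entropy_maximize(H, dim, n_samples=200, elite_frac=0.1, n_iters=60, seed=0):
    """Model-based search with a Gaussian sampling distribution (cross-entropy style)."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.full(dim, 5.0)        # initial sampling distribution
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # step (1): draw candidate solutions from the current distribution
        X = mean + std * rng.standard_normal((n_samples, dim))
        scores = np.array([H(x) for x in X])
        # step (2): refit the distribution to the best-performing candidates
        elite = X[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-12
    return mean
```

Only function values of the sampled candidates are used, so `H` may be non-differentiable or a black box; the cost is that the distribution can collapse prematurely if the elite fraction is too small.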

In [36, 38], we proposed a gradient-based adaptive stochastic search (GASS) framework for
solving general optimization problems with little structure. The main motivation is to integrate
the robustness of model-based algorithms with familiar gradient-based tools from classical
differentiable optimization to facilitate the search for good sampling distributions. The
underlying idea is to reformulate the original (possibly non-differentiable) optimization problem
as a differentiable optimization problem over the parameter space of the sampling distribution, and
then use a direct gradient search method on the parameter space to solve the new formulation. This
leads to a natural algorithmic framework that combines the advantages of both approaches: the fast
convergence of gradient-based methods and the global exploration of stochastic search.
Specifically, each iteration of our proposed method consists of two steps: (1) generate candidate
solutions from the current sampling distribution; (2) update the parameters of the sampling
distribution using a direct gradient search method. Although a variety of gradient-based algorithms
are applicable in step (2), we focused on a particular algorithm that uses a Newton-like procedure
to update the sampling distribution parameters. Since the algorithm uses only the information
contained in the sampled solutions, it differs from the Newton-like method in deterministic
optimization in that extra Monte Carlo sampling noise is involved at each parameter updating step.
We showed that this stochastic version of the Newton-like iteration can be expressed in the form of
a generalized Robbins-Monro algorithm, which in turn allowed us to use existing tools from
stochastic approximation theory to analyze the asymptotic convergence and convergence rate of the
proposed algorithm. The algorithm iteratively finds high-quality solutions by randomly sampling
candidate solutions from a parameterized distribution model over the solution space, combining the
robustness of stochastic search, which comes from considering a population of candidate solutions,
with the relatively fast convergence of classical gradient methods, which comes from exploiting
local differentiable structure. We analyzed the convergence and convergence rate properties of the
proposed algorithm and carried out a numerical study to illustrate its performance. In summary, the
major contributions of this work include (1) an algorithmic framework that integrates the central
idea of model-based optimization with direct gradient search; (2) a series of algorithms for
non-differentiable or even black-box optimization.
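A minimal Python sketch of the two-step GASS-style iteration follows, under simplifying assumptions that are ours rather than the papers': an independent Gaussian per coordinate as the sampling family, with sufficient statistic $T(x) = (x - \mu, (x - \mu)^2)$ centered at the current mean; a shape function equal to the indicator of the top `rho` fraction of sampled objective values; exact moments $\mathbb{E}_\theta[T]$ and $\operatorname{Var}_\theta[T] = \operatorname{diag}(\sigma^2, 2\sigma^4)$ in place of sample estimates; a constant step size `alpha` (stochastic approximation analyses usually assume diminishing steps); and a projection keeping $\sigma$ in `[sigma_min, sigma_max]`. The parameter update is the Newton-like step $\theta \leftarrow \theta + \alpha(\operatorname{Var}_\theta[T] + \epsilon I)^{-1}(\widehat{\mathbb{E}}_{\text{elite}}[T] - \mathbb{E}_\theta[T])$ on the natural parameters of the Gaussian. Centering $T$ at $\mu$ is an affine reparameterization that leaves this step unchanged when $\epsilon = 0$ but keeps the linear system well conditioned.

```python
import numpy as np


def gass_maximize(H, dim, n_samples=200, rho=0.2, alpha=0.5, n_iters=100,
                  eps=1e-12, sigma_min=1e-4, sigma_max=10.0, seed=0):
    """GASS-style sketch with an independent Gaussian N(mu_j, var_j) per coordinate."""
    rng = np.random.default_rng(seed)
    mu, var = np.zeros(dim), np.full(dim, 25.0)
    n_elite = max(1, int(rho * n_samples))
    for _ in range(n_iters):
        # (1) sample candidate solutions from N(mu, diag(var))
        X = mu + np.sqrt(var) * rng.standard_normal((n_samples, dim))
        scores = np.array([H(x) for x in X])
        # shape function = indicator of the top-rho fraction (an assumed choice)
        elite = X[np.argsort(scores)[-n_elite:]]
        # gradient of the reformulated objective: E_elite[T] - E_theta[T], T centered at mu
        d1 = (elite - mu).mean(axis=0)                    # E_theta[x - mu] = 0
        d2 = ((elite - mu) ** 2).mean(axis=0) - var       # E_theta[(x - mu)^2] = var
        # (2) Newton-like step on natural parameters (0, -1/(2 var)), Var_theta[T] = diag(var, 2 var^2)
        th1 = alpha * d1 / (var + eps)
        th2 = -0.5 / var + alpha * d2 / (2 * var**2 + eps)
        th2 = np.clip(th2, -0.5 / sigma_min**2, -0.5 / sigma_max**2)   # projection
        # map back to (mu, var)
        var = -0.5 / th2
        mu = mu + th1 * var
    return mu, var
```

Far from an optimum the elites lie on one side of the distribution, so the variance update can grow $\sigma$ and speed up exploration; near an optimum it shrinks $\sigma$ while the mean follows the elites.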

In  [7],  under  the  framework  of  GASS,  we  proposed  two  discrete  optimization  algorithms:  discrete 
gradient-based  adaptive  stochastic  search  (discrete-GASS)  and  annealing  gradient-based  adaptive 
stochastic  search  (annealing-GASS).  In  discrete-GASS,  we  transformed  the  discrete  optimization 
problem  to  a  continuous  optimization  problem  on  the  parameter  space  of  a  family  of  independent 
discrete  distributions,  and  applied  a  gradient-based  method  to  find  the  optimal  parameter  such  that 
the  corresponding  distribution  has  the  best  capability  to  generate  optimal  solution(s)  to  the  orig¬ 
inal  discrete  problem.  In  annealing-GASS,  we  used  Boltzmann  distribution  as  the  parameterized 
probabilistic  model,  and  propose  a  gradient-based  temperature  schedule  which  changes  adaptively 
with  respect  to  the  current  performance  of  the  algorithm.  We  proved  the  convergence  of  the  two 
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proposed  methods,  and  conduct  numerical  experiments  to  compare  these  two  methods  as  well  as 
some other existing methods. More specifically, the first algorithm (discrete-GASS) uses the independent
discrete distribution as the parameterized distribution, where the parameters determine the probabilities
that the components of a candidate solution will be selected. By introducing this probabilistic model,
instead of searching directly over the discrete solution space, we converted the discrete optimization
problem to a continuous problem on the parameter space, and hence we can apply a gradient-based method
on the parameter space to find the optimal probabilistic model. In this algorithm, we can easily generate
samples according to the parameterized distribution; however, the dimensionality of the transformed
problem on the parameter space is large compared to the original problem. Therefore, we proposed the
second algorithm (annealing-GASS), which uses the Boltzmann distribution as the probabilistic model; this
distribution has only a scalar parameter, known as the temperature. The dimension of the parameter is
therefore one, regardless of the dimension or cardinality of the solution space. Unlike discrete-GASS,
where the optimal parameter is unknown and must be found by a gradient-based method, the optimal parameter
(temperature) in annealing-GASS is well known to be zero; however, generating samples exactly from a
sequence of Boltzmann distributions with time-dependent temperatures and choosing an appropriate
temperature schedule are difficult tasks in implementation. Various temperature schedules have been
studied and tested in the literature [12, 23, 27, 5, 9, 25]. We derived a temperature schedule under the
GASS framework by using a gradient-based method to solve the reformulated objective function on the
parameter and adaptively updating the parameter based on the current performance of the algorithm. In
summary, the major contributions of this work include (1) an extension of gradient-based adaptive
stochastic search to discrete optimization, with convergence results; and (2) a new adaptive temperature
schedule for a converging sequence of Boltzmann distributions.
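To make the annealing-GASS idea concrete, the sketch below treats the Boltzmann family p_beta(x) proportional to exp(beta * H(x)) (maximization, inverse temperature beta = 1/T) as an exponential family whose sufficient statistic is H(x). Under an assumed shape function S(H) = exp(s * H), reweighting p_beta by S gives p_(beta+s), so a variance-preconditioned gradient step on beta needs only two expectations. The shape function, step size, eps, function names, and the exact enumeration over a six-point toy space are illustrative assumptions (enumeration sidesteps the sampling difficulty described above and is feasible only for tiny problems); the temperature schedule derived in our work may differ.

```python
import math


def boltzmann_probs(values, beta):
    """Boltzmann probabilities p_beta(x) proportional to exp(beta * H(x)) (maximization)."""
    h_max = max(values)  # shift exponents so the largest term is exp(0) = 1
    weights = [math.exp(beta * (h - h_max)) for h in values]
    z = sum(weights)
    return [w / z for w in weights]


def mean_and_var(values, beta):
    """Exact mean and variance of H(X) under p_beta, by enumeration (toy spaces only)."""
    p = boltzmann_probs(values, beta)
    mean = sum(pi * h for pi, h in zip(p, values))
    var = sum(pi * (h - mean) ** 2 for pi, h in zip(p, values))
    return mean, var


def annealing_gass_sketch(values, beta0=0.0, step=0.5, shape=1.0, eps=1e-8, iters=50):
    """Hypothetical gradient-based schedule for the inverse temperature beta = 1/T.

    Reweighting p_beta by S(H) = exp(shape * H) gives p_(beta + shape), so the
    shape-weighted mean of H is E_(beta+shape)[H]. The update is the gradient
    E_(beta+shape)[H] - E_beta[H] preconditioned by (Var_beta[H] + eps)^(-1).
    """
    beta = beta0
    betas = [beta]
    for _ in range(iters):
        mean, var = mean_and_var(values, beta)
        shaped_mean, _ = mean_and_var(values, beta + shape)
        beta += step * (shaped_mean - mean) / (var + eps)
        betas.append(beta)
    return betas


if __name__ == "__main__":
    H = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]  # objective values on a six-point toy space
    betas = annealing_gass_sketch(H)
    p_final = boltzmann_probs(H, betas[-1])
    print(f"final temperature T = {1.0 / betas[-1]:.4f}")
    print(f"probability of the maximizer: {p_final[H.index(max(H))]:.4f}")
```

In a realistic problem the two expectations cannot be enumerated and would have to be estimated from samples, which is where the sampling difficulty mentioned above reappears.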

In [15], we aimed to improve the efficiency of model-based algorithms by reducing the number of candidate
solutions generated per iteration. We did this by embedding a stochastic averaging procedure within these
methods to make more efficient use of past sampling information. This procedure not only can reduce the
number of function evaluations needed to obtain high-quality solutions, but also makes the underlying
algorithms more amenable to parallel computation. The detailed implementation of our approach is
demonstrated through an exemplary algorithm instantiation called Model-based Annealing Random Search with
Stochastic Averaging (MARS-SA), which keeps the per-iteration sample size at a small constant value. We
established the global convergence property of MARS-SA and provided numerical examples to illustrate its
performance.
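The stochastic-averaging idea can be illustrated with a minimal one-dimensional sketch: a Gaussian sampling model draws only four points per iteration, and the weighted first and second moments of each small batch are folded into running averages instead of being discarded. The annealing schedule T_k = 1/ln(k+1), the Boltzmann-style weights exp(-h/T_k), the averaging step (k+1)^(-0.6), the variance floor, and all function names are hypothetical choices for illustration; this is not the exact MARS-SA recursion.

```python
import math
import random


def averaged_update(avg, new, step):
    """Stochastic averaging step: avg <- (1 - step) * avg + step * new, componentwise."""
    return [(1.0 - step) * a + step * n for a, n in zip(avg, new)]


def model_based_search_sa(objective, mu0, sigma0, n_samples=4, iters=300,
                          var_floor=1e-6, seed=0):
    """Hypothetical 1-D model-based random search with stochastic averaging.

    Minimizes `objective` with a Gaussian sampling model N(mu, sigma^2). Each
    iteration draws only `n_samples` points, weights them by exp(-h / T_k), and
    folds the batch's weighted first and second moments into running averages,
    so information from earlier batches keeps contributing.
    """
    rng = random.Random(seed)
    moments = [mu0, mu0 ** 2 + sigma0 ** 2]  # running averages of E_w[x] and E_w[x^2]
    mu, sigma = mu0, sigma0
    best_x, best_h = None, math.inf
    for k in range(1, iters + 1):
        temperature = 1.0 / math.log(k + 1)  # assumed annealing schedule
        step = 1.0 / (k + 1) ** 0.6          # assumed averaging step size
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        hs = [objective(x) for x in xs]
        h_min = min(hs)                      # shift exponents for numerical stability
        ws = [math.exp(-(h - h_min) / temperature) for h in hs]
        total = sum(ws)
        batch = [sum(w * x for w, x in zip(ws, xs)) / total,
                 sum(w * x * x for w, x in zip(ws, xs)) / total]
        moments = averaged_update(moments, batch, step)
        mu = moments[0]
        sigma = math.sqrt(max(moments[1] - mu * mu, var_floor))
        for x, h in zip(xs, hs):             # track the best sampled point
            if h < best_h:
                best_x, best_h = x, h
    return mu, sigma, best_x, best_h


if __name__ == "__main__":
    mu, sigma, best_x, best_h = model_based_search_sa(
        lambda x: (x - 2.0) ** 2, mu0=-3.0, sigma0=3.0)
    print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, best x = {best_x:.3f}")
```

Because both the running moments and each batch's weighted moments are valid (mean, second-moment) pairs, their convex combination always yields a nonnegative variance; the floor only guards against rounding and collapse.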

In [37], we extended the idea of model-based algorithms for deterministic optimization to simulation
optimization over continuous space. Model-based algorithms iteratively generate a population of candidate
solutions from a sampling distribution and use the performance of the candidate solutions to update the
sampling distribution. By viewing the original simulation optimization problem as another optimization
problem over the parameter space of the sampling distribution, we proposed to use a direct gradient search
on the parameter space to update the sampling distribution. To improve computational efficiency, we
further developed a two-timescale updating scheme that updates the parameter on a slow timescale and
estimates the quantities involved in the parameter update on a fast timescale. We analyzed the convergence
properties of our algorithms using techniques from stochastic approximation, and illustrated their
performance by comparing them with two state-of-the-art model-based simulation optimization methods.
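The two-timescale structure can be sketched in one dimension. In this hypothetical example, a Gaussian sampling model is tracked through its mean parameters (E[x], E[x^2]). On the fast timescale, running averages estimate E[S(h)x], E[S(h)x^2], and E[S(h)] from small batches of noisy simulation outputs h; on the slow timescale, the mean parameters move toward the ratio of those estimates. For an exponential family, moving the mean parameters toward the shape-weighted moments corresponds, to first order, to a natural-gradient step on ln E_theta[S(H(X))]. The shape function S(h) = exp(-h), the step sizes (k+1)^(-0.6) and (k+1)^(-0.9), the noisy quadratic test function, and all names are assumptions; the algorithms analyzed in [37] may differ in their details.

```python
import math
import random


def two_timescale_search(simulate, mu0, sigma0, n_samples=10, iters=2000,
                         var_floor=1e-8, seed=1):
    """Hypothetical two-timescale model-based search for min E[simulate(x)].

    Fast timescale (step b_k): running averages of the batch means of
    S(h) * (x, x^2) and S(h), with shape function S(h) = exp(-h).
    Slow timescale (step a_k, a_k / b_k -> 0): move the mean parameters
    eta = (E[x], E[x^2]) of N(mu, sigma^2) toward num / den.
    """
    rng = random.Random(seed)
    eta = [mu0, mu0 ** 2 + sigma0 ** 2]
    num = [0.0, 0.0]  # fast estimates of E[S(h) * x] and E[S(h) * x^2]
    den = 0.0         # fast estimate of E[S(h)]
    mu, sigma = mu0, sigma0
    for k in range(1, iters + 1):
        b_k = 1.0 / (k + 1) ** 0.6  # fast step size (assumed)
        a_k = 1.0 / (k + 1) ** 0.9  # slow step size (assumed)
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        ss = [math.exp(-simulate(x, rng)) for x in xs]  # shaped noisy outputs
        batch_num = [sum(s * x for s, x in zip(ss, xs)) / n_samples,
                     sum(s * x * x for s, x in zip(ss, xs)) / n_samples]
        batch_den = sum(ss) / n_samples
        num = [n + b_k * (bn - n) for n, bn in zip(num, batch_num)]
        den += b_k * (batch_den - den)
        if den > 0.0:  # slow update of the sampling-distribution parameters
            eta = [e + a_k * (n / den - e) for e, n in zip(eta, num)]
        mu = eta[0]
        sigma = math.sqrt(max(eta[1] - mu * mu, var_floor))
    return mu, sigma


def noisy_quadratic(x, rng):
    """Toy 'simulation': (x - 1)^2 observed with Gaussian noise (sd 0.5)."""
    return (x - 1.0) ** 2 + rng.gauss(0.0, 0.5)


if __name__ == "__main__":
    mu, sigma = two_timescale_search(noisy_quadratic, mu0=-2.0, sigma0=2.0)
    print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```

Since num and den are updated with the same step b_k, their ratio is always a nonnegatively weighted average of sampled (x, x^2) pairs, so the slow update keeps eta a valid pair of Gaussian moments.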
3 Research Output from AFOSR Support


3.1 Publications

1. Enlu Zhou* and Shalabh Bhatnagar, "Gradient-based Adaptive Stochastic Search for Simulation
Optimization over Continuous Space", under third-round review, INFORMS Journal on Computing.

2. Helin Zhu, Fan Ye, and Enlu Zhou*, "A Regression Approach to Dual Problems in Dynamic
Programming", conditionally accepted, IEEE Transactions on Automatic Control.

3. Joshua Hale, Enlu Zhou*, and Jiming Peng, "A Lagrangian Search Method for the P-Median
Problem", to appear, Journal of Global Optimization, 2016.

4. Xi Chen and Enlu Zhou*, "Population Model-Based Optimization", Journal of Global Optimization,
63(1), pp. 125-148, 2015.

5. Fan Ye and Enlu Zhou*, "Duality and Information Relaxation in Controlled Markov Diffusions",
IEEE Transactions on Automatic Control, 60(10), pp. 2676-2691, 2015.

6. Helin Zhu, Fan Ye, and Enlu Zhou*, "Fast Estimation of True Bounds on Bermudan Option Price
under Jump Diffusion Processes", Quantitative Finance, 15(11), pp. 1885-1900, 2015.

7. Jiaqiao Hu, Enlu Zhou*, and Qi Fan, "Model-based Annealing Random Search with Stochastic
Averaging", ACM Transactions on Modeling and Computer Simulation, 24(4), pp. 21:1-21:23, 2014.

8. Enlu Zhou* and Jiaqiao Hu, "Gradient-based Adaptive Stochastic Search for Non-differentiable
Optimization", IEEE Transactions on Automatic Control, 59(7), pp. 1818-1832, 2014.

9. Fan Ye and Enlu Zhou*, "Optimal Stopping of Partially Observable Markov Processes: A
Filtering-based Duality Approach", IEEE Transactions on Automatic Control, 58(10), pp. 2698-2704, 2013.

10. Enlu Zhou*, "Optimal Stopping under Partial Observation: Near-Value Iteration", IEEE Transactions
on Automatic Control, 58(2), pp. 500-506, 2013.

11. Enlu Zhou*, Shalabh Bhatnagar, and Xi Chen, "Simulation Optimization via Gradient-based
Stochastic Search", in Proceedings of the 2014 Winter Simulation Conference, pp. 3869-3879, 2014.

12. Fan Ye and Enlu Zhou*, "Dual Formulation of Controlled Markov Diffusions and Its Application",
in Proceedings of the 19th IFAC World Congress, pp. 7811-7818, 2014.

3.2 Software

We developed user-friendly software based on the algorithms developed under GASS. There are two
versions of the software: one is an application in Matlab, and the other is a standalone application.
Both are available for free download on the PI's website, http://enluzhou.gatech.edu/software.html.
The website also includes a user manual, demos, and all the Matlab code. Using the GASS method, this
software provides an interactive environment for solving deterministic optimization problems with
multiple local optima, problems with few structural properties, and black-box optimization problems.
The algorithms provided for different types of problems are listed below.


Constraint type              | Continuous problem                                | Discrete problem
-----------------------------|---------------------------------------------------|------------------------
Unconstrained                | Continuous-GASS; Continuous-GASS with Averaging   | Discrete Annealing-GASS
Linear equality constraint   | Continuous-GASS (Linear Equality Constraint)      |
Linear inequality constraint | Continuous-GASS (Linear Inequality Constraint)    |
Box constraint               |                                                   | Discrete-GASS

The main user interface of the software is shown below.


3.3 Students Supported

1. Fan Ye, Ph.D. 2015, Georgia Institute of Technology; now an Associate at Morgan Stanley.

2. Helin Zhu, Ph.D. 2016, Georgia Institute of Technology; now a data scientist at Uber.

3. Joshua Hale, Ph.D. expected in 2017, Georgia Institute of Technology.